With the recent growth of computer vision applications, the question of how fair and unbiased they are has yet to be fully explored. There is abundant evidence that biases present in training data are reflected, and even amplified, in models. Many previous approaches to debiasing image datasets, including models based on augmenting the dataset, are computationally expensive to implement. In this study, we propose a fast and effective model to debias image datasets by reconstructing the images while minimizing the statistical dependence between the intended variables. Our architecture includes a U-Net to reconstruct images, combined with a pre-trained classifier that penalizes the statistical dependence between the target attribute and the protected attribute. We evaluate our proposed model on the CelebA dataset, compare the results with state-of-the-art debiasing methods, and show that the model achieves a promising fairness-accuracy combination.
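As a minimal, dependency-free sketch of the kind of dependence penalty described above: the debiasing objective combines a reconstruction loss with a term that shrinks the statistical dependence between target-attribute scores and the protected attribute. Here that dependence is illustrated as squared Pearson correlation; the function names and the choice of correlation as the dependence measure are assumptions for illustration, not the paper's exact formulation (which uses a pre-trained classifier).

```python
from math import sqrt

def pearson_corr(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sqrt(sum((x - mx) ** 2 for x in xs))
    vy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

def debias_penalty(target_scores, protected_attr, weight=1.0):
    """Squared-correlation penalty: zero when target predictions are
    statistically uncorrelated with the protected attribute."""
    return weight * pearson_corr(target_scores, protected_attr) ** 2
```

A training loop would add this penalty to the reconstruction loss, pushing the reconstructed images toward representations from which the protected attribute cannot be predicted.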
The StarCraft II Multi-Agent Challenge (SMAC) was created as a challenging benchmark problem for cooperative multi-agent reinforcement learning (MARL). SMAC focuses on the problem of StarCraft micromanagement and assumes that each unit is controlled individually by a learning agent that acts independently and has only local information; centralized training with decentralized execution (CTDE) is assumed. To perform well in SMAC, MARL algorithms must handle the dual problems of multi-agent credit assignment and joint action evaluation. This paper introduces a new architecture, TransMix, a transformer-based joint action-value mixing network which we show to be efficient and scalable compared to other state-of-the-art cooperative MARL solutions. TransMix leverages the ability of transformers to learn richer mixing functions for combining the agents' individual value functions. It performs comparably to previous work on standard SMAC scenarios, and outperforms other techniques on hard scenarios, as well as scenarios corrupted with Gaussian noise to simulate the fog of war.
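A mixing network combines per-agent value estimates into a joint action-value. As a much simpler illustration of the idea (not the paper's transformer architecture), here is a QMIX-style monotonic mixer: taking the absolute value of state-conditioned weights guarantees that increasing any agent's Q-value never decreases the joint value, which keeps each agent's decentralized argmax consistent with the centralized objective under CTDE. The function and parameter names are hypothetical.

```python
def monotonic_mix(agent_qs, state_weights, bias=0.0):
    """Joint Q = sum_i |w_i(s)| * Q_i + b(s).

    The absolute value enforces monotonicity of the joint value in each
    agent's individual Q-value, regardless of the sign of the learned weight.
    """
    return sum(abs(w) * q for w, q in zip(state_weights, agent_qs)) + bias
```

TransMix replaces this fixed weighted sum with transformer layers, allowing a richer (but still monotonic) combination of the agents' value functions.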
This paper presents a new approach for predicting team performance from the behavioral traces of a set of agents. This spatiotemporal forecasting problem is highly relevant to sports analytics challenges such as coaching and opponent modeling. We demonstrate that our proposed model, a spatial-temporal graph convolutional network (ST-GCN), outperforms other classification techniques at predicting game score from short segments of player movement and game features. Our proposed architecture uses a graph convolutional network to capture the spatial relationships between team members and a gated recurrent unit to analyze dynamic motion information. An ablative evaluation was performed to demonstrate the contributions of different aspects of our architecture.
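The graph-convolutional component above propagates features over the team's player graph. A minimal, dependency-free sketch of one propagation step, with self-loops and mean aggregation (the learned weight matrix is omitted for brevity; this illustrates the aggregation pattern, not the paper's exact layer):

```python
def graph_conv(adj, feats):
    """One graph-convolution step: each node averages its own features and
    those of its neighbors, per adjacency matrix `adj` (0/1 entries)."""
    n = len(adj)
    out = []
    for i in range(n):
        neigh = [j for j in range(n) if adj[i][j]] + [i]  # neighbors + self-loop
        out.append([sum(feats[j][k] for j in neigh) / len(neigh)
                    for k in range(len(feats[0]))])
    return out
```

Stacking such layers lets information about each player's position and motion spread across teammates before the recurrent unit processes the temporal dynamics.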
Machine learning algorithms have revolutionized different fields, including natural language processing, computer vision, signal processing, and medical data processing. Despite the excellent capabilities of machine learning algorithms in various tasks and areas, the performance of these models deteriorates significantly when there is a shift between the test and training data distributions. This gap occurs due to the violation of the fundamental assumption that the training and test data are independent and identically distributed (i.i.d.). In real-world scenarios where collecting data from all possible domains for training is costly and even impossible, the i.i.d. assumption can hardly be satisfied. The problem is even more severe in the case of medical images and signals because collecting data, even for a single domain, requires either expensive equipment or a meticulous experimental setup. Additionally, the decrease in performance may have severe consequences in the analysis of medical records. As a result of such problems, the ability to generalize and adapt under distribution shifts (domain generalization (DG) and domain adaptation (DA)) is essential for the analysis of medical data. This paper provides the first systematic review of DG and DA on functional brain signals, filling the gap left by the absence of a comprehensive study in this area. We provide detailed explanations and categorizations of the datasets, approaches, and architectures used in DG and DA on functional brain images. We further address the promising future directions in this field.
It is well known that the translation of songs and poems not only breaks rhythm and rhyming patterns, but can also result in a loss of semantic information. The Bhagavad Gita is an ancient Hindu philosophical text, originally written in Sanskrit, that features a conversation between Lord Krishna and Arjuna prior to the Mahabharata war. The Bhagavad Gita is also one of the key sacred texts in Hinduism and is known as the forefront of the Vedic corpus of Hinduism. In the last two centuries, there has been great interest in Hindu philosophy among Western scholars, and hence the Bhagavad Gita has been translated into a number of languages. However, not much work has been done to verify the quality of the English translations. Recent advances in language models powered by deep learning have enabled not only translation, but a better understanding of language, along with semantic and sentiment analysis. Our work is motivated by these recent advances in deep-learning-powered language models. In this paper, we compare selected translations of the Bhagavad Gita (mostly from Sanskrit to English) using semantic and sentiment analysis. We use a hand-labelled sentiment dataset to fine-tune a state-of-the-art deep-learning-based language model known as Bidirectional Encoder Representations from Transformers (BERT). We use novel sentence embedding models to provide semantic analysis of selected chapters and verses across the translations. Finally, we use the aforementioned models for sentiment and semantic analysis and provide visualizations of the results. Our results show that although the style and vocabulary vary widely across the respective Bhagavad Gita translations, sentiment analysis and semantic similarity show that the messages conveyed are mostly similar across the translations.
Advances in the state of the art in 3D human sensing are currently limited by the lack of visual datasets with 3D ground truth that include multiple people in motion, operating in real-world environments with complex illumination or occlusion, and possibly observed by a moving camera. Sophisticated scene understanding requires estimating human pose and shape as well as gestures, towards representations that ultimately combine useful metric and behavioral signals with free-viewpoint reconstruction. To sustain progress, we build a large-scale photo-realistic dataset, Human-SPACE (HSPACE), of animated humans placed in complex synthetic indoor and outdoor environments. We combine a hundred diverse individuals of varying ages, genders, proportions, and ethnicities with hundreds of motions and scenes, as well as parametric variations in body shape (for a total of 1,600 different humans), in order to generate an initial dataset of over one million frames. Human animations are obtained by fitting an expressive human body model to single scans of people, followed by novel re-targeting and positioning procedures that support the realistic animation of dressed humans, statistical variation of body proportions, and jointly consistent scene placement of multiple moving people. Assets are generated automatically, at scale, and are compatible with existing real-time rendering and game engines. The dataset, together with an evaluation server, will be made available for research. Our large-scale analysis of the impact of synthetic data, in connection with real data and weak supervision, underlines the considerable potential for continuing quality improvements and for limiting the sim-to-real gap in this practical setting, in connection with increased model capacity.
Vision Transformers (ViTs) are emerging as the state-of-the-art architecture for image recognition. While recent studies suggest that ViTs are more robust than their convolutional counterparts, our experiments find that ViTs are overly reliant on local features (e.g., nuisances and texture) and fail to make adequate use of global context (e.g., shape and structure). As a result, ViTs fail to generalize to out-of-distribution, real-world data. To address this deficiency, we present a simple and effective architectural modification to ViT's input layer: adding discrete tokens produced by a vector-quantized encoder. Unlike the standard continuous pixel tokens, discrete tokens are invariant under small perturbations and individually contain less information, which promotes ViTs to learn global information that is invariant. Experimental results demonstrate that adding discrete representations to four architecture variants strengthens robustness by up to 12% across seven ImageNet robustness benchmarks while maintaining performance on ImageNet.
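The discrete tokens above come from a vector-quantized encoder: each patch embedding is snapped to its nearest codebook entry, so a small perturbation that does not cross a codebook boundary leaves the token unchanged. A minimal nearest-neighbour quantizer as an illustrative sketch (the function name and L2 distance are assumptions; real VQ encoders learn the codebook jointly with the encoder):

```python
def quantize(vec, codebook):
    """Return (index, entry) of the codebook vector closest to vec
    in squared Euclidean distance."""
    dists = [sum((v - c) ** 2 for v, c in zip(vec, entry)) for entry in codebook]
    idx = min(range(len(codebook)), key=dists.__getitem__)
    return idx, codebook[idx]
```

Because the output is one of finitely many codebook entries, nearby inputs map to the same discrete token, which is the invariance property the abstract attributes to discrete representations.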
This paper introduces a video dataset of spatiotemporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations with possibly multiple annotations for each person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) using movies to gather a varied set of action representations. This departs from existing datasets for spatio-temporal action recognition, which typically provide sparse annotations for composite actions in short video clips. AVA, with its realistic scene and action complexity, exposes the intrinsic difficulty of action recognition. To benchmark this, we present a novel approach for action localization that builds upon the current state-of-the-art methods, and demonstrates better performance on JHMDB and UCF101-24 categories. While setting a new state of the art on existing datasets, the overall results on AVA are low at 15.6% mAP, underscoring the need for developing new approaches for video understanding.
Deep neural networks coupled with fast simulation and improved computation have led to recent successes in the field of reinforcement learning (RL). However, most current RL-based approaches fail to generalize since: (a) the gap between simulation and the real world is so large that policy-learning approaches fail to transfer; (b) even if policy learning is done in the real world, data scarcity leads to failed generalization from training to test scenarios (e.g., due to different friction or object masses). Inspired by H∞ control methods, we note that both modeling errors and differences between training and test scenarios can be viewed as extra forces/disturbances in the system. This paper proposes the idea of robust adversarial reinforcement learning (RARL), where we train an agent to operate in the presence of a destabilizing adversary that applies disturbance forces to the system. The jointly trained adversary is reinforced, that is, it learns an optimal destabilization policy. We formulate the policy learning as a zero-sum, minimax objective function. Extensive experiments in multiple environments (InvertedPendulum, HalfCheetah, Swimmer, Hopper, and Walker2d) conclusively demonstrate that our method (a) improves training stability; (b) is robust to differences in training/test conditions; and (c) outperforms the baseline even in the absence of the adversary.
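RARL casts training as a zero-sum game: the protagonist maximizes reward while the adversary's disturbances minimize it. As a toy illustration of this minimax structure (a sketch only, not the paper's policy-gradient training), consider a payoff matrix whose rows are protagonist actions and whose columns are adversary disturbances, optimized by alternating best responses:

```python
def alternating_best_response(payoff, steps=10):
    """Alternately let the protagonist pick the best row against the current
    column, and the adversary pick the worst column against the current row.
    On a matrix with a pure saddle point this converges to the minimax pair."""
    row, col = 0, 0
    for _ in range(steps):
        row = max(range(len(payoff)), key=lambda i: payoff[i][col])
        col = min(range(len(payoff[0])), key=lambda j: payoff[row][j])
    return row, col, payoff[row][col]
```

In RARL the two "players" are neural policies trained with RL instead of tabular best responses, but the alternating maximize/minimize structure is the same.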